title: "Notes on A/B Testing (Udacity)"
author: Yuanzhe Li
date: 2020-02
output: pdf_document
linkcolor: blue

Notes on the A/B Testing (Udacity) course.

Lesson 1: Overview of A/B Testing

1.15 Calculating confidence interval (CTR example)

1.17 Null and Alternative Hypothesis, Two-tailed vs. One-tailed tests

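The null and alternative hypotheses proposed here correspond to a two-tailed test, which allows you to distinguish between three cases: a statistically significant positive result, a statistically significant negative result, and no statistically significant difference.

Sometimes when people run A/B tests, they use a one-tailed test, which only allows you to distinguish between two cases: a statistically significant positive result and no statistically significant result.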

Which one you should use depends on what action you will take based on the results.

If you're going to launch the experiment for a statistically significant positive change, and otherwise not, then you don't need to distinguish between a negative result and no result, so a one-tailed test is good enough. If you want to learn the direction of the difference, then a two-tailed test is necessary.
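The difference between the two tests can be illustrated with a minimal Python sketch. It assumes a z statistic has already been computed from the experiment; the helper names and the example value of 1.8 are hypothetical and chosen only for illustration.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_values(z: float) -> dict:
    """Return the upper one-tailed and the two-tailed p-value for a z statistic."""
    one_tailed_upper = 1.0 - normal_cdf(z)           # tests "treatment > control" only
    two_tailed = 2.0 * (1.0 - normal_cdf(abs(z)))    # tests "treatment != control"
    return {"one_tailed_upper": one_tailed_upper, "two_tailed": two_tailed}


# Hypothetical z statistics: the same magnitude in both directions.
for z in (1.8, -1.8):
    print(z, p_values(z))
```

With a significance level of 0.05, a z statistic of 1.8 is significant under the upper one-tailed test (p is about 0.036) but not under the two-tailed test (p is about 0.072). For a z statistic of -1.8 the one-tailed test reports nothing, which is why it cannot tell a negative result from no result.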

1.19 Pooled Standard Error

1.21 - 24. Sample Size and Power

1.25 Pooled Example

A pooled example is shown below. Notice how $d_{min}$ is used: to launch, the lower bound of the $1-\alpha$ confidence interval must exceed $d_{min} = 0.02$.

[Figure: pooled example]
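The pooled calculation can be sketched in Python as follows. This is a minimal sketch, not the course's own code: the function name is hypothetical, and the click and pageview counts are illustrative assumptions rather than values taken from these notes. Only $d_{min} = 0.02$ comes from the text above.

```python
import math
from statistics import NormalDist


def pooled_difference_ci(x_cont, n_cont, x_exp, n_exp, alpha=0.05):
    """Estimate the difference in proportions and its (1 - alpha) CI
    using the pooled standard error."""
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    d_hat = x_exp / n_exp - x_cont / n_cont
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    margin = z * se_pool
    return d_hat, d_hat - margin, d_hat + margin


# Illustrative counts (assumed): successes and sample sizes per group.
d_hat, lower, upper = pooled_difference_ci(x_cont=974, n_cont=10072,
                                           x_exp=1242, n_exp=9886)
d_min = 0.02
print(f"d_hat={d_hat:.4f}, CI=({lower:.4f}, {upper:.4f})")
print("lower bound exceeds d_min:", lower > d_min)
```

With these assumed counts the interval is roughly (0.0202, 0.0377), so the lower bound sits just above $d_{min}$ and the change would count as both statistically and practically significant.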

1.26 Confidence Interval Case Breakdown

Shown below is how the decision should be made for different combinations of the confidence interval and $d_{min}$.

[Figure: CI breakdown]
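One way to express such a breakdown in code is sketched below. This is a simplified reading under stated assumptions, not the course's exact chart: the function name, the four labels, and the treatment of boundary values are all hypothetical choices made for illustration.

```python
def classify_ci(ci_low: float, ci_high: float, d_min: float) -> str:
    """Map a confidence interval for the difference onto a coarse decision,
    given the practical significance boundary d_min (assumed positive)."""
    if d_min <= 0 or ci_low > ci_high:
        raise ValueError("expected d_min > 0 and ci_low <= ci_high")
    if ci_low > d_min:
        return "launch"               # whole interval above d_min
    if ci_high < -d_min:
        return "harmful"              # whole interval below -d_min
    if -d_min < ci_low and ci_high < d_min:
        return "no practical change"  # interval inside (-d_min, d_min)
    return "inconclusive"             # interval straddles a boundary


# Hypothetical intervals, all with d_min = 0.02.
for ci in [(0.0202, 0.0377), (-0.005, 0.005), (0.01, 0.03), (-0.04, -0.025)]:
    print(ci, "->", classify_ci(*ci, d_min=0.02))
```

The "inconclusive" label covers every interval that crosses $d_{min}$ or $-d_{min}$. In those cases the data cannot rule a practically significant change in or out, so the usual options are to collect more data or to make a judgment call with the stakeholders.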

Lesson 2: Policy and Ethics for Experiments

2.1 - 2.7. Four Principles

The IRB's four main principles to consider when conducting experiments are:

2.8 Assessing Data Sensitivity

An example of data sensitivity assessment is shown below

[Figure: assessing data sensitivity]

2.10 Summary of Principles

Lesson 3: Choosing and Characterizing Metrics

3.2 - 3.3 Metric Definition Overview

3.5 Refining the Customer Funnel

An example of defining metrics for Udacity

3.6 - 3.7 Quizzes on Choosing Metrics

3.8 Other techniques for defining metrics

3.10 - 11 Techniques to Gather Additional Data and Examples

[Figure: 3.10 techniques for getting additional data]

[Figure: 3.11 gather data - Udacity]

[Figure: 3.12 when there is no data]

3.13 Metric Definition: Click Through Example

3.16 - 3.17 Summary Metrics

3.18 - 3.19 Sensitivity and Robustness

3.20 Absolute Versus Relative Differences

3.21 - 3.22 Variability

3.24-25 Empirical Variability

Lesson 4: Designing an Experiment

Outline of lesson 4

4.2 - 4.3 Unit of Diversion Overview

4.4 - 4.5 Consistency of Diversion

4.6 - 4.7 Ethical Considerations

4.8 - 4.9 Unit of Analysis vs. Diversion

4.10 Inter- vs. Intra-User Experiments

In an interleaved ranking experiment, suppose you have two ranking algorithms, $X$ and $Y$. Algorithm $X$ would show results $X_1, X_2, \ldots, X_N$ in that order, and algorithm $Y$ would show $Y_1, Y_2, \ldots, Y_N$. An interleaved experiment would show some interleaving of those results, for example, $X_1, Y_1, X_2, Y_2, \ldots$ with duplicate results removed. One way to measure this would be by comparing the click-through-rate or -probability of the results from the two algorithms. For more detail, see Large-Scale Validation and Analysis of Interleaved Search Evaluation.
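The interleaving described above can be sketched in Python. This is a minimal sketch of a plain alternating scheme with duplicates removed, not one of the specific interleaving methods from the cited paper. The function names and the example result lists are hypothetical.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for the shorter list when lengths differ


def interleave(results_x, results_y):
    """Alternate X_1, Y_1, X_2, Y_2, ... and drop results already shown.
    Each kept result is tagged with the algorithm that contributed it."""
    merged, seen = [], set()
    for x, y in zip_longest(results_x, results_y, fillvalue=_MISSING):
        for item, source in ((x, "X"), (y, "Y")):
            if item is not _MISSING and item not in seen:
                seen.add(item)
                merged.append((item, source))
    return merged


def click_counts(merged, clicked):
    """Count clicks credited to each algorithm."""
    counts = {"X": 0, "Y": 0}
    for item, source in merged:
        if item in clicked:
            counts[source] += 1
    return counts


merged = interleave(["a", "b", "c"], ["b", "d", "a"])
print(merged)                            # [('a', 'X'), ('b', 'Y'), ('d', 'Y'), ('c', 'X')]
print(click_counts(merged, {"a", "d"}))  # {'X': 1, 'Y': 1}
```

Because this sketch always places the result from $X$ first in each pair, it gives $X$ a positional advantage. A production scheme would need to balance or randomize which algorithm goes first.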

4.11 - 4.13 Target Population, Cohort

4.16 - 4.18 Sizing Examples

4.20 - 22. Duration vs. Exposure

4.23 Learning Effects

Lesson 5: Analyzing Results

Outline of this section

5.1 - 5.7 Sanity Checks (invariant metrics)

5.8 - 5.9 Single Metric

Carrie gave some ideas of what you can do if your results aren't significant, but you were expecting they would be. One tempting idea is to run the experiment for a few more days and see if the extra data helps get you a significant result. However, this can lead to a much higher false positive rate than you expect! See the post (How Not To Run an A/B Test) for more details. Instead of running for longer when you don't like the results, you should size your experiment in advance to ensure that you will have enough power the first time you look at your results.
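The effect of repeatedly checking results can be illustrated with a small A/A simulation, sketched below in Python. Both groups share the same true conversion rate, so every significant result is a false positive. All parameters (number of looks, users per look, baseline rate, seed) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def z_stat(x_a, x_b, n):
    """Pooled two-proportion z statistic for two groups of equal size n."""
    p_pool = (x_a + x_b) / (2 * n)
    se = math.sqrt(p_pool * (1 - p_pool) * 2 / n)
    if se == 0:
        return 0.0
    return (x_b / n - x_a / n) / se


def simulate(n_sims=1000, looks=5, users_per_look=200, p=0.1,
             z_crit=1.96, seed=0):
    """Return (false positive rate when stopping at any significant look,
    false positive rate when testing only once at the final look)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(n_sims):
        x_a = x_b = n = 0
        significant_at_any_look = False
        significant_now = False
        for _ in range(looks):
            # Both groups draw from the same rate p: there is no real effect.
            x_a += sum(rng.random() < p for _ in range(users_per_look))
            x_b += sum(rng.random() < p for _ in range(users_per_look))
            n += users_per_look
            significant_now = abs(z_stat(x_a, x_b, n)) > z_crit
            significant_at_any_look = significant_at_any_look or significant_now
        any_look_hits += significant_at_any_look
        final_look_hits += significant_now
    return any_look_hits / n_sims, final_look_hits / n_sims


peeking_rate, single_look_rate = simulate()
print(f"false positive rate with peeking:     {peeking_rate:.3f}")
print(f"false positive rate with single look: {single_look_rate:.3f}")
```

The single-look rate should land near the nominal 5%, while the peeking rate can only be equal or higher, because any experiment that is significant at the final look is also counted under peeking. The gap between the two is the inflation the post warns about.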

5.10 - 11. Simpson's Paradox

5.12 - 5.15. Multiple Metrics

5.16. Analyzing Multiple Metrics

5.17. Draw Conclusions

5.18. Changes Over Time

Lesson 6: Final Project

As of March 1st, 2020, I have gone through the first five (video) lessons of the course, and will take an indefinite leave of absence from finishing the final project lesson. Some resources are listed below.

Reference